ART-CEE USER'S GUIDE ART-CEE is a generic inference engine. It was written in MIX C on an Apricot F1, a run-of-the-mill MS-DOS machine. The source and load modules have been successfully compiled and executed on IBM XT, AT and clone machines and should port easily to any MS-DOS machine. The source code also should be fairly easy to convert to other C dialects. Inference engines are a category of artificial intelligence programs. Like traditional databases, they structure information for later retrieval. But they also allow can guide the user through solving problems, point out unnoticed connections within the data and suggest new possibilities. They are capable of working with certainties as well as 'fuzzy' information which is incomplete or tentative. Information is entered into inferences engines as rules. In ART-CEE's case, these rules are "If...then" propositions. Such rules should be closely related and basic to the body of knowledge concerned; the quality of the system built with the inference engine depends most on the accuracy and ade- quacy of these rules. The user should enter enough rules to draw the major connections between the data items, but there is no need to enter everything that there is to know--after all, inference engines are designed to draw their own inferences. Information is retrieved from inference engines through queries. A query is a question or a command which causes the inference engine to search the database, extract the information it knows immediately and/or draw the necessary inferences to come as close as possible to meeting the request, and display both its conclusions and how it arrived at them. ART-CEE has three kinds of queries that allow a broad range of services to you. Each rule in ART-CEE is associated with a percentage of truth, occurrence or certainty. If the rule is always true its percentage is 100; if it is never true its percentage is 0. ART-CEE also uses this number of mark some rules as logically impossible and to prevent the entering of tautologies. These percentages are used to control the query process. GETTING STARTED Before you do anything, make a copy of all ART-CEE files, hide that copy from neighbors, friends, angry spouses, magnetic fields, dust and nuclear explosion. ART-CEE is delivered as four source files. Compile each source file individually. If you are using the MIX system, you may wish to optimize WORKUP2.C for speed. Then link the four object files together and name the output file ART-CEE.COM. The program will run faster if you link the runtime functions with the WORKUP objects, but this will increase the size of the final .COM file to above 60K. If you find that ART-CEE's space requirements exceed what is available, the amount of 'stack' and 'heap' used can be affected greatly by adjusting the value of MAX in the header files. ART-CEE is supplied with a value of 60 for that variable. All source files use carry exactly the same value for MAX. Locate the following files in the same directory on the default drive: ART-CEE.COM, HELP1.AIH, HELP2.AIH, HELP3.AIH, HELP4.AIH. If you chose not to link the MIX runtime functions with the WORKUP objects, the MIX C library functions also must reside in that directory, unless a PATH command has been executed. The runtime module also must have access to the MS-DOS i-o routines on the system disk; screen blanking will not succeeed if these routines are not available on the default drive or through a PATH command. MAIN MENU AND PROMPT The main input screen displays a menu of all available input options. Commands and default settings may be addressed by entering the single letter in the appropriate menu phrase. For instance, to load a data file, enter "L". Usually the letter to be entered is the first letter in the phrase, but in any case it is the only capitalized letter. Do not enter the full word or phrase; ART-CEE will try to interpret that entry as a rule or subject. ART-CEE will convert your inputs to all-capitals; the program will execute more quickly if you enter them as capitals in the first place. All inputs to ART-CEE must end in the "Return" or "Enter" key. Beginning with version 1.4, inputs are processed without regard to case. Rules and queries demand a more involved input. Any input at the main prompt that is longer than one character will be treated as a rule or query. RULES All rules must begin with the word "IF " and contain the word " THEN ". Between these words must be a subject, and after the " THEN " must be a predicate (apologies to English teachers who know that such phrases are not grammatical subjects and predicates). The entire rule cannot be longer than eighty characters. Neither subject nor predicate should contain punctuation. It follows that the rule should not end in punctuation (the reason for this is that later matching against the predicate would have to include the same punctuation, which probably would be incorrect for that usage and a mess to remember). Any leading grammatical article ("A ", "AN " or "THE ") in sub- ject or predicate will be ignored. ART-CEE searches the database on each rule input to determine if it already knows the subject or predicate. If it finds that the rule already exists, you are given the opportunity to enter new percentages of occurrence for that rule. Otherwise, if there is room in the database for the new rule, the rule is added to the knowledge base. If override of default percentages is turned on, you are asked to input forward and reverse percents. Otherwise the defaults are used. ART-CEE can handle up to MAX number of different subjects and/or predi- cates. Internally, subjects and predicates are stored identically, as the subject of one rule likely will be the predicate of another. The maximum number of rules that can be handled is MAX * (MAX -1). A forward percentage of occurrence refers to the percent of time that the rule is true as entered. If fifty percent of all humans are female, then the rule "IF HUMAN THEN FEMALE" would have a forward percentage of 50.00000 (trailing zeroes need not be entered). Reverse percentages refer to the percent of time that the rule is true in reverse format. Using the example above, if one percent of all females are human, the reverse percentage for "IF HUMAN THEN FEMALE" would be 1.000000. Percentages must be not less than zero (zero means "never true") and less than one hundred (one hundred means "always true"). ART-CEE stores impossible rules with negative percentages of occurrence, but you cannot enter them at rule entry time (see commands B and G). QUERIES Three query formats are available in ART-CEE: simple query, two-element query and thinking. The simple query is a request for all rules concerning just one subject in the database. It reports only those rules which contain the subject as subject or predicate; no inferences are drawn. Only positive rules are reported; if a potential rule has not been entered, or if the rule is marked as impossible, it is not reported. Simple queries are entered by entering any of the following phrases, followed by the subject: WHO, WHO IS, WHO IS A, WHO IS AN, WHO IS THE, WHAT, WHAT IS, WHAT IS A, WHAT IS AN, WHAT IS THE, DESCRIBE, DESCRIBE A, DESCRIBE AN, or DESCRIBE THE. The subject may be followed by a question mark, which is ignored. ART-CEE searches the database for an exact match on the subject. If the subject is found, all forward rules for that subject are reported to the computer monitor, followed by all reverse rules. The two-element query asks is a certain rule is true. Using the above example, to find out what percentage of all females are human, the input would be: "IF FEMALE THEN HUMAN?" Note that the input is exactly like the input for the entering of the rule, except that the rule ends in a ques- tion mark. All parsing rules applicable for rules are applicable for two- element queries. If subject and predicate are in the database, the query begins. If the rule already exists in the database it is reported, and the query search ends. If the rule does not exist (ie., is marked as having a zero percentage of occurrence), ART-CEE attempts to find any way possible to chain together enough inferences to draw a conclusion about the rule. For instance, suppose that ART-CEE does not know directly how many females are human, but it does know the following rules: IF FEMALE THEN MAMMAL 20% IF MAMMAL THEN HUMAN 3% ART-CEE will link these two rules together and conclude that the rule "IF FEMALE THEN HUMAN" is true 0.6% of the time (20% times 3%). It reports the chain that it used to draw this conclusion on the monitor and asks if you agree with each step in the chain. If you disagree with any step, the entire chain beginning with the part that you rejected is abandoned, and the search continues. If you agree with the entire chain you will be asked if you wish to add the new fact ("IF FEMALE THEN HUMAN" 0.6%) to the data- base and complies with your decision. The search resumes to find other ways that FEMALE and HUMAN can be linked until all possible links have been examined. The two-element query also can work with incomplete data. Sometimes it is possible to make a connection between two subjects only if one additional rule is assumed to be true. For instance, suppose that the following rules are known: IF FEMALE THEN MAMMAL 20% IF MAMMAL THEN TWO-LEGGED 3% IF TAILLESS THEN HUMAN 10% ART-CEE cannot conclude IF FEMALE THEN HUMAN unless one of the following assumptions is made: FEMALEs are TAILLESS, MAMMALs are TAILLESS, TWO-LEGGED implies TAILLESS, MAMMALs are HUMAN or TWO-LEGGED implies HUMAN. By setting the default number of assumptions on the main menu (function A), the number of assumptions which will be included in each attempt to chain subjects to- gether is controlled. Changing the number of assumptions to 1 allows 1 such assumption per attempt, etc. The maximum number of assumptions allowed in any one chain is MAX - 3. Increasing the number of allowed assumptions increases the power of the two-element query, but is also geometrically in- creases the effort both program and user must exert to get through the query. If the number of assumptions is nonzero ART-CEE may request information from the user at what seems to be odd moments. When the user is asked to agree to the assumption as drawn, a "Y" response will result in the new rule's addition to the database, and the user will be asked for the percentage of occurrence for that rule, whether or not the override of defaults is on. The final query form, "thinking", is an automated extension of the two- element query. Every subject in the database is chained to every other sub- ject in the database. Before the query is executed, ART-CEE goes through the database marking all logical impossibilities. These impossibilities take the form of: IF A THEN B cannot be true (ie., IF A THEN B has a negative percentage) IF C THEN B is always true (ie., IF C THEN B has a 100% percentage) Therefore, IF A THEN C can never be true. Then ART-CEE begins chaining. Only rules with positive percentages are in- cluded in the chains (ie., no assumptions are drawn). All chains are extended as far as they will go, up to the think depth setting shown on the main menu. That is, if the think depth setting is 3, then no chain will be extended be- yond three subjects. The minimum depth setting is three, and the maximum setting is MAX - 1. While increasing the depth setting increases the prob- ability of finding all possible inferences, it also greatly increases the time necessary to perform the query. Seldom does a depth of more than four or five prove efficient. All connections drawn in the "thinking" function are applied to the database, and the greatest percentage found for each rule is the final one saved in the database. No impossible rules are changed. COMMANDS A brief discussion of each rule follows, each prefixed by the single- character entry that invokes the rule. A Set number of assumptions that will be allowed in any single chain in the two-element query. Minimum number is 0; maximum is MAX - 3. B Enter a file containing mutually exclusive subjects, and mark all occurrences of those subjects in the database as mutually exclusive. For instance, assume that a file contains the following facts: DOG, CAT, PIG, HORSE, COW. Assume also that the database contains the subjects DOG, CAT, HORSE, COW. This rule will mark the follow- ing rules as impossible: IF DOG THEN CAT IF DOG THEN HORSE IF DOG THEN COW IF CAT THEN DOG IF CAT THEN HORSE IF CAT THEN COW IF HORSE THEN DOG IF HORSE THEN CAT IF HORSE THEN COW IF COW THEN DOG IF COW THEN HORSE IF COW THEN CAT The PIG item in the file is ignored. C Change a subject without changing any percentages associated with it. For instance, the subject "PIG" could be changed to "SWINE". Then all rules that referenced "PIG" will now reference "SWINE". D Drop a rule. Suppose that the database contains the rule "IF COLLIE THEN DOG". By selecting this option, the rule can be marked as having a zero percentage of occurrence. If this is the only rule referencing COLLIE, then the subject COLLIE is erased. The same would occur to DOG is this was the only rule referencing DOG. F Set default forward percentage of occurrence. Valid values for this default are not less than zero and not greater than one hundred. This setting can be used to great advantage if a large number of similarly- occurring rules are to be entered at the same setting. Set the default appropriately, set the default reverse percentage as well, turn off the override option, and just enter the rules. All prompts for per- centage inputs will be skipped. G Enter a group of mutually exclusive subjects from the keyboard. The function works the same way as "B", except that the keyboard is the source of information. A single letter "E" ends the input stream and begins the marking of subjects. Any number of subjects can be en- tered, but only those actually found in the database will be stored. After the subjects are marked, you will be given opportunity to save the group for later use as an input file under option "B". H Help screens are available online. These screens are an abbreviated version of this user's guide. The function will work only if the help files are located in the default directory on the default drive or an MS-DOS PATH command was issued prior to entering ART-CEE. I Initialize the database. This function erases all subjects in the databased marks all percentages of occurrence as zero except for tautologies (IF A THEN A), which are marked as impossible. K Set depth of thinking chaining. See the discussion of thinking under queries, above. L Load a data file from disk. The database developed in a previous ART-CEE session can be reloaded using this function. If the file con- tains fewer than MAX subjects, the data in the file will overlay the contents of the database now occupying the positions from which the file was written and leave the remainder of the database untouched. If the highest-numbered subject in the file exceeds the current value of MAX, the file cannot be successfully loaded. M Toggle the showing of the main menu. If the current setting is "Y" (show the menu), choosing this function will change the setting to "N" (do not show the menu). Once the commands and query/rule format are well-known, significant time can be saved by switching the setting to "N". O Toggle the overriding of default percentages of occurrence. If the current setting is "Y" (yes), choosing this function will change the setting to "N" (no), and vice versa. P Print the database. All rules in the database will be copied to the printer (LPT1), together with their percentages of occurrence. These rules will be grouped by subject, all forward references first and then์ reverse references. Therefore, all rules will be printed twice. If all subjects are used and all subjects have a completely filled set of rules, the number of lines printed will be MAX * 2 * (MAX + 2). R Set the default percentage used for reverse references. Valid values are not less than zero and not greater than one hundred. S Save the database to disk. The database can be saved in whole or in part. To save just part of the database, enter the starting and ending positions when prompted (use function V to determine what subject is in what position). If the present database was loaded from disk, the filename from which the database was loaded will be offered to you as a default, otherwise the filename "ART-CEE.DAT" will be offered. The default can be overridden by entering any other name when prompted. If a file by the chosen name already exists, it will be overwritten without backup. T Think. See discussion under "QUERIES". V View the database. All used subjects are listed in order of entry, followed by the number of forward references and number of reverse references for each. X Exit the program. You will be asked if you wish to save the database before the exit occurs. The default is not to save the database. ART-CEE AND DATA STRUCTURES Like all software, ART-CEE has to deal with four possible relationships within the data it handles. The simplest and least significant is the one- to-one relationship. An item in the database has just one connection with one other item in the database, and that's all. "IF A then B" defines such a relationship. If "A" is true then "B" is always true; if it were other- wise, the rule would be just one part of a larger set of truths, and "A" would connect with at least one more item besides "B". The one-to-many relationship involves multiple possibilities rising from the same item. "If A then B" is true part of the time, and "If A then C" also is true part of the time, but "B" and "C" cannot be true at the same time. The many-to-one relationship is created by such combinations as "If C then A" and "If B then A". Two items have the same kind of relationship with another item in the database; they lead to the same conclusion. This relationship is not the same as "If C then if B then A". Finally, the many-to-many relationship combines the one-to-many and many-to-one relationships. The following rules constitute a many-to-many relationship: If A then B. If A then C. If C then B. If B then C. The relationships between items intertwine. Expressing one-to-one relationships with ART-CEE is straightforward. Just enter a new rule. However, unless more rules are entered, transforming the one-to-one into one of the other types, the one-to-one relationship amounts to little more than a redefinition of the subject. One-to-many and many-to-one relationships are created by entering several rules having items in common on the 'many' side of the relationship. The grouping functions (commands 'B' and 'G') are then used to mark these common elements as having a mutually exclusive relationship with each other. The result is a hierarchy of information that can be diagrammed as a triangle: A B C / \ \ / / \ or \ / B C A Use of the grouping functions assures that all relationships in the above diagrams are vertical ("If B then C" and vice versa cannot be true without transforming the relationships into the many-to-many kind). This structure is appropriate for classification schemes, diagnosis patterns and family trees. Many-to-many relationships form a more nebulous pattern in which each element theoretically can rise from and lead to every other item. Such structures can be used to identify patterns in which statistics are known about the data but overall structure of the data is unclear. Many-to- many patterns can also be used to interrelate two or more hierarchies. The big factor with ART-CEE is that ART-CEE just loves to transform hier- archies (many-to-one and one-to-many relationships) into many-to-many forms. The drawing of inferences in the way that ART-CEE does this transformation. If you have build a hierarchical structure and want to keep it that way, be careful to do the following: 1. Do not use the 'think' function against your permanent database. 2. Do not add query findings to the database. 3. Do not use assumptions. 4. Use the 'grouping' functions completely. If you wish to break any of the above, be sure to save the database first, then do not save the database when exiting from ART-CEE. Of course, if you are working with a nebulous, highly interconnected database to begin with, feel free to infer to your heart's content.